Note: Some codes adapted from the lecture silds written by Ruth Isserlin. To compile this markdown file, please include the data and figures folder and place them at the same directory with this file.

Introduction

For this assignment, I used Transcriptional profile of human STAT1-/- fibroblasts expressing LY6E or empty control vector data GSE111958 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111958) which about if LY6E will enhance the infectivity of some viruses(Mar KB 2018). In assignment 1, I had already cleaned and normalized the data, in this assignment, I will do Differential Gene Expression analysis and Thresholded over-representation analysis.
Boxplot and density plot from A1

Perparation

Launching RStudio in Docker

First of all, use the following command to launch Rstudio in Docker.

docker run -e PASSWORD=1234 --rm -p 8787:8787 -v /Users/bfx/Documents/BCB420:/home/rstudio/docker_bcb420 risserlin/bcb420-base-image

Load the data

Use the normalized data from assignment 1.

normalized_count_data <- read.table(file=file.path("data", "GSE111958_finalized_normalized _counts.txt"),
                                    header = TRUE,sep = "\t",
                                    stringsAsFactors = FALSE,
                                    check.names=FALSE)
kable(normalized_count_data[1:5, ], type="html")
GENE_NAME SEU227 SEU235 SEU239 SLU227 SLU235 SLU239
NOC2L 36.480486 28.428988 38.926023 28.968916 25.511290 26.148589
AGRN 13.259045 9.391719 15.111525 9.837038 12.755645 9.444837
C1orf159 6.375934 6.091926 7.001937 4.841062 4.299656 8.248766
SDF4 9.201632 5.801834 11.353423 6.312745 5.876196 7.671353
B3GALT6 13.440179 13.235435 13.450049 12.548033 11.716562 10.929615

Heatmap

# Create the numerical matrix for heatmap
heatmap_matrix <- normalized_count_data[ , 2:ncol(normalized_count_data)]
rownames(heatmap_matrix) <- normalized_count_data$GENE_NAME
colnames(heatmap_matrix) <- colnames(normalized_count_data[, 2:ncol(normalized_count_data)])
# Create the heatmap
if(min(heatmap_matrix) == 0){
    heatmap_col = colorRamp2(c( 0, max(heatmap_matrix)), c( "white", "red"))
  } else {
    heatmap_col = colorRamp2(c(min(heatmap_matrix), 0, max(heatmap_matrix)), c("blue", "white", "red"))
  }
current_heatmap <- Heatmap(as.matrix(heatmap_matrix),
                               show_row_dend = TRUE,
                               show_column_dend = TRUE, 
                               col=heatmap_col,
                               show_column_names = TRUE, 
                               show_row_names = FALSE,
                               show_heatmap_legend = TRUE
                               )
without_row_normalization_heatmap<- current_heatmap

Draw the heatmap directly, we can see that the range of the data is too large and the most value are distributed at low counts, so the whole heatmap is blue and only the top is red. heatmap without Row-normalization

# Row-normalization
heatmap_matrix <- t(scale(t(heatmap_matrix)))
if(min(heatmap_matrix) == 0){
    heatmap_col = colorRamp2(c( 0, max(heatmap_matrix)), c( "white", "red"))
  } else {
    heatmap_col = colorRamp2(c(min(heatmap_matrix), 0, max(heatmap_matrix)), c("blue", "white", "red"))
  }
current_heatmap <- Heatmap(as.matrix(heatmap_matrix),
                               show_row_dend = TRUE,
                               show_column_dend = TRUE, 
                               col=heatmap_col,
                               show_column_names = TRUE, 
                               show_row_names = FALSE,
                               show_heatmap_legend = TRUE
                               )
row_normalization_heatmap <- current_heatmap

Scale each row and centre the around the mean, so we can narrow down the range of the data to make the heatmap more easy to read. row-normalization heatmap

Differential Gene Expression

empty_samples <- grep(colnames(normalized_count_data),
                          pattern="^SEU")
LY6E_samples <- grep(colnames(normalized_count_data),
                          pattern="^SLU")
gene_of_interest <- which(normalized_count_data$GENE_NAME == "LY6E")


# LY6E expression in empty samples
LY6E_empty_samples <- t(normalized_count_data
                       [gene_of_interest,
                         empty_samples])
colnames(LY6E_empty_samples) <- c("empty_samples")
LY6E_empty_samples
##        empty_samples
## SEU227      8.404640
## SEU235      5.801834
## SEU239      7.397527
# LY6E expression in LY6E samples
LY6E_L_samples <- t(normalized_count_data
                       [gene_of_interest,
                         LY6E_samples])
colnames(LY6E_L_samples) <- c("LY6E_samples")
LY6E_L_samples
##        LY6E_samples
## SLU227     2465.263
## SLU235     1964.477
## SLU239     1869.830
# Using a simple t.test compare STAT1 gene. 
# The null hypothesis of the two sample t-test is that there is no difference in means of each sample 
# It assumes that both stat1_LY6E_samples and stat1_empty_samples are normally distributed.
t.test(x=t(LY6E_L_samples),y=t(LY6E_empty_samples))
## 
##  Welch Two Sample t-test
## 
## data:  t(LY6E_L_samples) and t(LY6E_empty_samples)
## t = 11.328, df = 2.0001, p-value = 0.007702
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1297.825 2887.485
## sample estimates:
##   mean of x   mean of y 
## 2099.856560    7.201334

Now we can revisit the MDSPlot with limma package

rep_colors <- rainbow(3)
rep_colors<- rep(rep_colors, 2)

limma::plotMDS(heatmap_matrix, col=rep_colors)

 Define the groups, The first 3 character represent the conditions (Empty control or LY6E)

samples <- data.frame(lapply(colnames(normalized_count_data)[2:7], 
        FUN=function(x){unlist(strsplit(x, perl=TRUE, "(?=[A-Za-z])(?<=[0-9])|(?=[0-9])(?<=[A-Za-z])"))}))

colnames(samples) <- colnames(normalized_count_data)[2:7]
rownames(samples) <- c("cell_type", "rep")
samples <- data.frame(t(samples))
samples
model_design <- model.matrix(~ samples$cell_type)
kable(model_design, type="html")
(Intercept) samples$cell_typeSLU
1 0
1 0
1 0
1 1
1 1
1 1

Limma

# Create data matrix
expressionMatrix <- as.matrix(normalized_count_data[,2:7])
rownames(expressionMatrix) <- normalized_count_data$GENE_NAME
colnames(expressionMatrix) <- colnames(normalized_count_data)[2:7]
minimalSet <- ExpressionSet(assayData=expressionMatrix)

# Fit the data to the model
fit <- lmFit(minimalSet, model_design)

# Apply empircal Bayes to compute differential expression for the above described model.
fit2 <- eBayes(fit,trend=TRUE)
topfit <- topTable(fit2, 
                   coef=ncol(model_design),
                   adjust.method = "BH",
                   number = nrow(expressionMatrix))

# Merge hgnc names to topfit table
output_hits <- merge(normalized_count_data[,1],
                     topfit,
                     by.y=0,by.x=1,
                     all.y=TRUE)
# Rename the colnames
colnames(output_hits)<-c("hgnc_symbol", colnames(output_hits)[2:ncol(output_hits)])

# sort by pvalue
output_hits <- output_hits[order(output_hits$P.Value),]

kable(output_hits[1:10,],type="html")
hgnc_symbol logFC AveExpr t P.Value adj.P.Val B
5213 LY6E 2092.65523 1053.52895 14.501502 0.0000007 0.0084803 0.1230086
2949 EIF5A -97.89656 377.62546 -9.667372 0.0000144 0.0794557 -0.3053602
8132 RPS10 -75.05683 314.78603 -9.179320 0.0000208 0.0794557 -0.3808356
9378 SZRD1 -36.91134 189.28848 -8.549779 0.0000344 0.0950860 -0.4930467
968 BGN -39.04973 68.22038 -8.323150 0.0000416 0.0950860 -0.5381532
7828 RCC2 -39.92165 219.98396 -7.769751 0.0000671 0.1163271 -0.6604712
6096 NDST1 -40.52974 170.06888 -7.702479 0.0000712 0.1163271 -0.6766295
557 ARHGAP1 -19.49320 54.61903 -7.099849 0.0001243 0.1775787 -0.8356092
2067 COL1A1 -309.90818 644.96489 -6.714755 0.0001808 0.2090339 -0.9522483
3610 FST 23.54229 85.83211 6.703419 0.0001828 0.2090339 -0.9558796

 the number of genes pass the threshold p-value < 0.05

length(which(output_hits$P.Value < 0.05))
## [1] 1476

 the number of genes pass correction

length(which(output_hits$adj.P.Val < 0.05))
## [1] 1

Account for the rep variability to improve the results

model_design_rep <- model.matrix(
  ~ samples$rep + samples$cell_type)
kable(model_design_rep,type="html")
(Intercept) samples$rep235 samples$rep239 samples$cell_typeSLU
1 0 0 0
1 1 0 0
1 0 1 0
1 0 0 1
1 1 0 1
1 0 1 1

Fit the data to the new model

fit_rep <- lmFit(minimalSet, model_design_rep)

Apply empircal Bayes to compute differential expression for the new model

# Since my data is RNA-seq data, we set the parameter trend=TRUE
fit2_rep <- eBayes(fit_rep,trend=TRUE)

topfit_rep <- topTable(fit2_rep, 
                   coef=ncol(model_design_rep),
                   adjust.method = "BH",
                   number = nrow(expressionMatrix))
# Merge hgnc names to topfit table
output_hits_rep <- merge(normalized_count_data[,1],
                         topfit_rep,
                         by.y=0,by.x=1,
                         all.y=TRUE)
# Rename the colnames
colnames(output_hits_rep)<-c("hgnc_symbol", colnames(output_hits_rep)[2:ncol(output_hits)])

# Sort by pvalue
output_hits_rep <- output_hits_rep[order(output_hits_rep$P.Value),]

kable(output_hits_rep[1:10,],type="html")
hgnc_symbol logFC AveExpr t P.Value adj.P.Val B
5213 LY6E 2092.65523 1053.52895 17.020902 0.0000020 0.0229303 -0.2569220
2067 COL1A1 -309.90818 644.96489 -15.028586 0.0000043 0.0243468 -0.3220116
968 BGN -39.04973 68.22038 -9.480230 0.0000659 0.1521454 -0.7220288
8132 RPS10 -75.05683 314.78603 -9.306232 0.0000734 0.1521454 -0.7447125
2949 EIF5A -97.89656 377.62546 -9.270816 0.0000751 0.1521454 -0.7494546
2076 COL5A1 -156.39722 368.23077 -8.816718 0.0001004 0.1521454 -0.8142812
4365 HYOU1 -59.25213 284.60310 -8.587464 0.0001169 0.1521454 -0.8500763
3503 FILIP1L 36.78239 151.52313 8.579183 0.0001175 0.1521454 -0.8514109
6096 NDST1 -40.52974 170.06888 -8.551337 0.0001198 0.1521454 -0.8559201
9378 SZRD1 -36.91134 189.28848 -8.006140 0.0001747 0.1602140 -0.9514484

 the number of genes pass the threshold p-value < 0.05

length(which(output_hits_rep$P.Value < 0.05))
## [1] 1865

 the number of genes pass correction

length(which(output_hits_rep$adj.P.Val < 0.05))
## [1] 2

Draw MA plot, The point looks too few, but I checked dim(topfit) which is 11433, 6. Seems like the range of the data is too large.

differential_expression_results <- topfit[rownames(minimalSet), ]
differential_expression_status <- apply(differential_expression_results, 1, function(gene){
  if (gene["P.Value"] < 0.05) {
    if (gene["logFC"] < 0 ){
       return("Downregulated genes")
    }
    else{
      return("Upregulated genes")
    }
  } else {
    return("Not differentially expressed")
  }
})

limma::plotMD(fit2, column = ncol(fit2), cex = 0.5, status = differential_expression_status,
              main = "Differential gene expression")

Compare the results from two different models

simple_model_pvalues <- data.frame(hgnc_symbol = output_hits$hgnc_symbol,
                                   simple_pvalue=output_hits$P.Value)
rep_model_pvalues <-  data.frame(hgnc_symbol = output_hits_rep$hgnc_symbol,
                                 rep_pvalue = output_hits_rep$P.Value)
two_models_pvalues <- merge(simple_model_pvalues,
                            rep_model_pvalues,by.x=1,by.y=1)
two_models_pvalues$colour <- "black"
two_models_pvalues$colour[two_models_pvalues$simple_pvalue<0.05] <- "orange"
two_models_pvalues$colour[two_models_pvalues$rep_pvalue<0.05] <- "blue"
two_models_pvalues$colour[two_models_pvalues$simple_pvalue<0.05 & two_models_pvalues$rep_pvalue<0.05] <- "red"

# plot
plot(two_models_pvalues$simple_pvalue,two_models_pvalues$rep_pvalue,
     col = two_models_pvalues$colour,
     xlab = "simple model p-values",
     ylab ="Rep model p-values", 
     main="Simple vs Rep Limma")

For the interest gene

hgnc_symbol_of_interest <- "LY6E"

# There are too many overlap around the gene of interest, so I set others genes be transparent
two_models_pvalues$colour <- alpha("grey", 0.2)
two_models_pvalues$colour[two_models_pvalues$hgnc_symbol==hgnc_symbol_of_interest] <- "red"

plot(two_models_pvalues$simple_pvalue,two_models_pvalues$rep_pvalue,
     col = two_models_pvalues$colour,
     xlab = "Simple model p-values",
     ylab ="Rep model p-values",
      main="Simple vs Rep Limma (for interest gene)")

# Since there are too many overlap arround the gene of interest, use a point to mark the location of it.
points(two_models_pvalues[two_models_pvalues$hgnc_symbol==hgnc_symbol_of_interest, 2:3], pch=24,  col="red", cex=1.5)

Draw the Heatmap with the top hits using Limma (accounting for rep variability) which only show the p-value < 0.05 and columns ordered by cell type.

top_hits <- output_hits_rep$hgnc_symbol[output_hits_rep$P.Value<0.01]
heatmap_matrix_tophits <- t(
  scale(t(heatmap_matrix[which(rownames(heatmap_matrix) %in% top_hits),]))) 
heatmap_matrix_tophits<- heatmap_matrix_tophits[,
       c(grep(colnames(heatmap_matrix_tophits), pattern = "^SEU"), 
         grep(colnames(heatmap_matrix_tophits), pattern = "^SLU"))]
if(min(heatmap_matrix_tophits) == 0){
    heatmap_col = colorRamp2(c( 0, max(heatmap_matrix_tophits)), 
                             c( "white", "red"))
  } else {
    heatmap_col = colorRamp2(c(min(heatmap_matrix_tophits), 0, max(heatmap_matrix_tophits)), c("blue", "white", "red"))
  }
current_heatmap <- Heatmap(as.matrix(heatmap_matrix_tophits),
                  cluster_rows = TRUE,  show_row_dend = TRUE,
                  cluster_columns = FALSE,show_column_dend = FALSE,
                  col=heatmap_col,show_column_names = TRUE, 
                  show_row_names = FALSE,show_heatmap_legend = TRUE)
current_heatmap

 Obviously, the heatmap show that the gene expression has significant different for different cell type.

edgeR

filtered_data_matrix <- as.matrix(normalized_count_data[,2:7])
rownames(filtered_data_matrix) <- normalized_count_data$GENE_NAME

# Set up edgeR objects
d = DGEList(counts=filtered_data_matrix, group=samples$cell_type)

# Estimate Dispersion
d <- estimateDisp(d, model_design_rep)

# Fit the model
fit <- glmQLFit(d, model_design_rep)

# Calculate differential expression using the Quasi liklihood model
qlf.pos_vs_neg <- glmQLFTest(fit, coef='samples$cell_typeSLU')
kable(topTags(qlf.pos_vs_neg), type="html")
logFC logCPM F PValue FDR
LY6E 8.1618454 10.074037 3963.76166 0.0000000 0.0000000
BGN -0.8476159 6.163483 126.08518 0.0000034 0.0182872
COL1A1 -0.7102162 9.366846 107.71946 0.0000062 0.0182872
COL5A1 -0.6189983 8.561459 106.93024 0.0000064 0.0182872
ACTN1 -0.4486346 7.557048 63.84841 0.0000430 0.0930915
RPS5 -0.6164600 6.691154 61.61884 0.0000489 0.0930915
GCN1L1 -0.5750288 6.618209 58.83285 0.0000577 0.0942132
EIF5A -0.3756372 8.598153 54.23442 0.0000771 0.1101932
DSP -0.5444963 7.376529 51.13672 0.0000949 0.1206020
NDST1 -0.3448606 7.456516 48.37142 0.0001408 0.1493329
x
BH
x
samples$cell_typeSLU
x
glm
# get all results
qlf_output_hits <- topTags(qlf.pos_vs_neg,sort.by = "PValue",
                           n = nrow(normalized_count_data))

 the number of genes pass the threshold p-value < 0.05

length(which(qlf_output_hits$table$PValue < 0.05))
## [1] 498

 the number of genes pass correction

length(which(qlf_output_hits$table$FDR  < 0.05))
## [1] 4

Compare the results from two different models

qlf_rep_model_pvalues <- data.frame(
          hgnc_symbol = rownames(qlf_output_hits$table),
          qlf_rep_pvalue=qlf_output_hits$table$PValue)
limma_rep_model_pvalues <-  data.frame(
          hgnc_symbol = output_hits_rep$hgnc_symbol,
          limma_rep_pvalue = output_hits_rep$P.Value)
two_models_pvalues <- merge(qlf_rep_model_pvalues,
                            limma_rep_model_pvalues,
                            by.x=1,by.y=1)
two_models_pvalues$colour <- "black"
two_models_pvalues$colour[two_models_pvalues$qlf_rep_pvalue<0.05] <- "orange"
two_models_pvalues$colour[two_models_pvalues$limma_rep_pvalue<0.05] <- "blue"
two_models_pvalues$colour[two_models_pvalues$qlf_rep_pvalue<0.05 & two_models_pvalues$limma_rep_pvalue<0.05] <- "red"
# plot
plot(two_models_pvalues$qlf_rep_pvalue,
     two_models_pvalues$limma_rep_pvalue,
     col = two_models_pvalues$colour,
     xlab = "QLF rep model p-values",
     ylab ="Limma rep model p-values",
     main="QLF vs Limma ")

hgnc_symbol_of_interest <- "LY6E"

two_models_pvalues$colour <- "grey"
two_models_pvalues$colour[two_models_pvalues$hgnc_symbol==hgnc_symbol_of_interest] <- "red"
plot(two_models_pvalues$qlf_rep_pvalue,
     two_models_pvalues$limma_rep_pvalue,
     col = two_models_pvalues$colour,
     xlab = "QLF patient model p-values",
     ylab ="Limma Patient model p-values",
     main="QLF vs Limma (for interest gene)")

# Since there are too many overlaps arround the gene of interest, use a point to mark the location of it.
points(two_models_pvalues[
  two_models_pvalues$hgnc_symbol==hgnc_symbol_of_interest,2:3],
       pch=24,  col="red", cex=1.5)

Draw the Heatmap with top hit using QLF model

top_hits <- rownames(qlf_output_hits$table)[output_hits_rep$P.Value<0.05] 
heatmap_matrix_tophits <- t(
  scale(t(heatmap_matrix[which(rownames(heatmap_matrix) %in% top_hits),]))) 
heatmap_matrix_tophits<- heatmap_matrix_tophits[,
       c(grep(colnames(heatmap_matrix_tophits),pattern = "^SEU"), 
         grep(colnames(heatmap_matrix_tophits),pattern = "^SLU"))]
if(min(heatmap_matrix_tophits) == 0){
    heatmap_col = colorRamp2(c( 0, max(heatmap_matrix_tophits)), 
                             c( "white", "red"))
  } else {
    heatmap_col = colorRamp2(c(min(heatmap_matrix_tophits), 0, max(heatmap_matrix_tophits)), c("blue", "white", "red"))
  }
current_heatmap <- Heatmap(as.matrix(heatmap_matrix_tophits),
                           cluster_rows = TRUE,
                           cluster_columns = FALSE,
                               show_row_dend = TRUE,
                               show_column_dend = FALSE,
                               col=heatmap_col,
                               show_column_names = TRUE, 
                               show_row_names = FALSE,
                               show_heatmap_legend = TRUE,
                               )

current_heatmap

 Also obviously, the heatmap show that the gene expression has significant different for different cell type.

# Merge gene names with the top hits
qlf_output_hits_withgn <- merge(normalized_count_data[,1],
                                qlf_output_hits, 
                                by.x=1, by.y = 0)
colnames(qlf_output_hits_withgn)<-c("hgnc_symbol", colnames(qlf_output_hits_withgn)[2:ncol(qlf_output_hits_withgn)])

qlf_output_hits_withgn[,"rank"] <- -log(qlf_output_hits_withgn$PValue,base =10) * sign(qlf_output_hits_withgn$logFC)
qlf_output_hits_withgn <- qlf_output_hits_withgn[order(qlf_output_hits_withgn$rank),]

upregulated_genes <- qlf_output_hits_withgn$hgnc_symbol[
  which(qlf_output_hits_withgn$PValue < 0.05 
             & qlf_output_hits_withgn$logFC > 0)]

downregulated_genes <- qlf_output_hits_withgn$hgnc_symbol[
  which(qlf_output_hits_withgn$PValue < 0.05 
             & qlf_output_hits_withgn$logFC < 0)]

all_differentially_expressed_genes <- qlf_output_hits_withgn$hgnc_symbol[
  which(qlf_output_hits_withgn$PValue < 0.05)]

We can see there are 498 differentially expressed genes, the most differential expression are downregulated which is 351, and upregulated genes are only 147.

Thresholded over-representation analysis

Use g:profiler for thresholded gene set enrichment analysis. Set organism to “Homo sapiens”, significance threshold to “Benjamini-Hochberg FDR” and threshold to “0.05”.

Data sources:
GO biological process: releases/2019-07-01

Reactome annotations: ensembl classes: 2019-10-2

WikiPathways: 20190910

Exprot the genelist for upregulated, downregulated and all differentially expressed genes. Then use these genelist as query separately.

# Export the genelist to txt file for g:profiler
write.table(x=upregulated_genes,
            file=file.path("data","LY6E_upregulated_genes.txt"),sep = "\t",
            row.names = FALSE,col.names = FALSE,quote = FALSE)
write.table(x=downregulated_genes,
            file=file.path("data","LY6E_downregulated_genes.txt"),sep = "\t",
            row.names = FALSE,col.names = FALSE,quote = FALSE)
write.table(x=all_differentially_expressed_genes,
            file = file.path("data", "all_differentially_expressed_genes.txt"),sep = "\t",
            row.names = FALSE,col.names = FALSE,quote = FALSE)

All differentially expressed genes

The results of differentially expressed genes all differentially expressed genes results
The number of genesets for each data source:
GO:BP: 954
REAC: 239
WP: 16
More details for each data source:

GO:BP

the top hit is cellular component organization, the number of genes found in cellular component organization is 276
all BP

REAC

the top hit is Axon guidance, the number of gene from query found in Axon guidance is 64
all REAC

WP

the top hit is Focal Adhesion, the number of gene from query found in Focal Adhesion is 32
all BP

Upregulated genes

The results of upregulated genes upregulated results

The number of genesets for each data source:
GO:BP: 83
REAC: 12
WP: 4
More details for each data source:
GO:BP
up BP

REAC
up REAC

WP
up wp

Downregulated genes

The results of downregulated genes downregulated results

The number of genesets for each data source:
GO:BP: 918
REAC: 232
WP: 22
More details for each data source:
GO:BP

down BP

down BP

REAC
down REAC

WP
down wp

Interpretation

For the g:profiler results, the result of downregulated is very similar with the result of all differentially expressed genes. For the whole list, cellular component organization is the top hit and viral process is the third hit in GO:BP which are both related to the infectivity of viruses. The result of upregulated shows that some genes related to cell cycle, Mitochondria and ATP synthesis are upregulated.
This support the conclusions discussed in the original paper that LY6E promotes early step of the virus life cycle and enhances infectivity of some viruses.
Also I found an article “RLR-mediated antiviral innate immunity requires oxidative phosphorylation activity”, published at 2017, said mitochondria is a platform for antiviral innate immunity and antiviral innate immunity requires oxidative phosphorylation (OXPHOS) activity.(Yoshizumi 2017) oxidative phosphorylation (OXPHOS) is the top hit for upregulated genelist in GO:BP and lots of genes related to mitochondria differentially expressed.
Therefore LY6E must play some important role in early step of the virus life cycle and infectivity of some viruses.

Reference

Mar KB, Boys IN, Rinkenberger NR. 2018. “LY6E Mediates an Evolutionarily Conserved Enhancement of Virus Infection by Targeting a Late Entry Step.” Nat Commun 9 (11): 3603.

Yoshizumi, Imamura, T. 2017. “RLR-Mediated Antiviral Innate Immunity Requires Oxidative Phosphorylation Activity.” Sci Rep 7: 5379.